Assessing the Impact of Cache Injection on Parallel Application Performance

Author

  • Edgar A. León
Abstract

The memory wall [13], the continuing disparity between processor and memory speeds, adversely affects the performance of many applications [8], particularly data-intensive computations [11]. Cache injection addresses this disparity by placing incoming network data into a processor's cache directly from the I/O bus. The effectiveness of this technique on application performance depends on several factors, including the ratio of processor to memory speed, the NIC's injection policy, and the application's communication characteristics. In this work, I show that the performance of cache injection is directly proportional to the ratio of processor to memory speed, i.e., the higher the memory wall, the greater the benefit. This result suggests that cache injection can be particularly effective on multi-core architectures, where the number of cores is growing faster than the number of available channels to memory. Unlike previous work, which focused on reducing the memory copies incurred by the network stack [1, 5], I show that cache injection improves application performance by leveraging the application's temporal and spatial locality in accessing incoming network data. In addition, the application's communication characteristics are key to cache injection performance; for example, cache injection can improve the performance of certain collective operations by 20%.

Cache injection [1, 5, 3] is one of several techniques to alleviate the memory wall [9, 10, 8]. It reduces data latency and memory pressure (the number of requests issued to the memory controller per unit of time) by placing incoming network data directly into the cache. In current architectures, incoming network data is written to the system's main memory and cached copies are invalidated. Cache injection replaces invalidate operations with updates if the corresponding cache lines are present, or with allocate operations if they are not.
In the next section, I describe how cache injection compares to a widely-used technique to reduce data latency, namely data prefetching. Unlike prefetching, cache injection significantly reduces memory pressure for I/O. Prefetching is driven by the access patterns of the processor (consumer of data), while cache injection is driven by the NIC (producer). This producer-initiated model makes cache injection prone to cache pollution. In Section 3, I show an example of this problem, and describe injection policies that determine what and when to inject into the cache to minimize pollution. In Section 4, I characterize application sensitivity to cache injection. In particular, I show that the performance of this technique is dependent on the degree to which systems are affected by the memory wall, the injection policy, and the application’s communication characteristics. Finally, I conclude in Section 5.
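The invalidate-versus-update distinction above can be illustrated with a toy model. The sketch below is purely hypothetical (none of its names come from the paper): it counts the memory accesses the consuming processor incurs when the NIC's DMA writes either invalidate cached copies (the baseline) or update/allocate cache lines (cache injection).

```python
# Toy model contrasting the baseline DMA policy (invalidate cached copies)
# with cache injection (update/allocate lines directly in the cache).
# All names here are illustrative assumptions, not the paper's code.

class Cache:
    def __init__(self):
        self.lines = {}   # address -> data
        self.misses = 0   # loads that had to go to main memory

    def load(self, memory, addr):
        if addr not in self.lines:
            self.misses += 1            # a memory access injection could avoid
            self.lines[addr] = memory[addr]
        return self.lines[addr]

def nic_write(memory, cache, addr, data, inject):
    memory[addr] = data                 # DMA always updates main memory
    if inject:
        cache.lines[addr] = data        # injection: update or allocate the line
    else:
        cache.lines.pop(addr, None)     # baseline: invalidate any cached copy

def run(inject):
    memory, cache = {}, Cache()
    for addr in range(8):               # NIC delivers 8 lines of network data
        nic_write(memory, cache, addr, addr * 2, inject)
    for addr in range(8):               # the application then consumes them
        cache.load(memory, addr)
    return cache.misses

print(run(inject=False), run(inject=True))  # baseline vs. injected miss counts
```

In this simplified model, every consumed line misses under the baseline policy, while injection satisfies all loads from the cache. It also hints at the pollution risk discussed in Section 3: injected lines occupy cache space whether or not the application reads them soon.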


Similar Articles

Impact of Oxytocin-Milking Method on Lactation Performance and Lactation Length of Sheep

Determination of milk yield potential using an accurate method is essential for assessing nutritional requirements, evaluation of genetic potential, lamb growth and survival, management decisions and improving performance traits of sheep flocks. Exogenous oxytocin injection is applied to estimate milk secretion rate in sheep. Oxytocin is a neurohormone produced in the hypothalamo-neurohypophysi...


Reducing Memory Bandwidth for Chip-Multiprocessors using Cache Injection

Current and future high-performance systems will be constructed using multi-core chips. These systems impose higher demands on the memory system. Lack of adequate memory bandwidth will limit application performance. To reduce memory bandwidth we propose to use cache injection of incoming network messages. The objective of this work is to demonstrate benefits of cache injection and provide a bas...


Characterization of L3 Cache Behavior of Java Application Server

This paper investigates the performance of the L3 cache of a Java application server, taking SPECjAppServer2002 as the representative workload. Shared L3 caches with sizes ranging from 4M to 1G are simulated utilizing the Programmable Hardware-Assisted Cache Emulator (PHA$E). Additionally, the impact of heap size and garbage collection method on the behavior of the L3s under study is analyzed. Heap siz...


The Impact of Message Traffic on Multicomputer

Multicomputer cache performance is highly sensitive to interprocessor message traffic. The widening gap between microprocessor speeds and primary memory latencies means slight increases in cache miss rate can have a severe impact on application performance. It is therefore critical to reduce cache misses. While there are a number of factors that may contribute to an increase in cache misses, of p...


Hybrid Shared-aware Cache Coherence Transition Strategy

Chip-multiprocessors have played a significant role in real parallel computer architecture design. To integrate tens of cores into a chip, designs tend towards physically distributed last-level caches. This naturally results in a Non-Uniform Cache Access design, where on-chip access latencies depend on the physical distances between requesting cores and home cores where the data is cach...



Journal:

Volume   Issue 

Pages  -

Publication date: 2009